Abstract. The mathematical derivation of the statistics used for
inference in some linear models assumes that values of the independent 
variables are pre-selected such that these variables can be treated as 
fixed rather than random variables. This assumption is often disregarded 
when these models are utilized in research. This study is an investiga- 
tion of the consequences of the violation of this assumption. The results 
of this study indicate that when the sample size is not too small the
consequences of the violation of this assumption are of little practical 
significance.

AN EMPIRICAL INVESTIGATION OF SOME EFFECTS OF THE VIOLATION OF
THE ASSUMPTION THAT THE COVARIABLE IN ANALYSIS OF COVARIANCE
IS A MATHEMATICAL VARIABLE^1

Dick S. Calkins
The University of Texas at El Paso
and Earl Jennings^2
The University of Texas at Austin

In many cases the analysis of data in behavioral research can be
accomplished through the formulation of a linear model which appears to
represent the essential aspects of a suspected relationship between the
independent and dependent variables being investigated. In such a model
the data appear as variables and the statistics appear as constants which
are calculated from the data. When certain mathematical procedures are
used for calculating the values of the statistics and certain assumptions
have been met concerning the selection and distribution of the values of
the variables in the population from which the data were drawn, it can be
shown mathematically that the statistics are "good" estimates of the
parameters and that accurate probability statements involving possible
differences in the parameters can be made. However, if the assumptions
concerning the selection and distribution of the values of the variables in the
population are not met, it may be very difficult to show mathematically
how the estimates of the parameters and the probability statements involving
differences in the parameters will be affected. In some instances, the
effects of the violation of these assumptions may be of small enough
magnitude that they are of no practical significance. The magnitude of the
effects of the violation of such assumptions can be investigated empirically
by repeatedly sampling from populations of values with known characteristics
when the assumptions to be investigated are not met. This study was
performed to investigate the effects of the violation of one such assumption.

^1 This paper is partially based on the Doctoral Dissertation of the
first author. The second author served as chairman of that dissertation
committee.

^2 Dick S. Calkins is an Assistant Professor in the Department of
Educational Psychology and Guidance and a Research Consultant for the
Computation Center at The University of Texas at El Paso. Earl Jennings is an
Associate Professor in the Department of Educational Psychology and
Assistant Director of the Measurement and Evaluation Center at The University
of Texas at Austin.

For the mathematical model which is the basis for regression analysis, 
it is necessary that the continuous independent variables are mathematical 
variables (in contrast to random variables) before it can be shown that 
the statistics are "good" estimates of the parameters or that the shape 
of the sampling distribution of the statistics follows the normal distri- 
bution (Graybill, 1961, pp. 195-200, 383-396). If the sampling distribu- 
tions of the statistics are non-normal, the probability statements involv- 
ing differences in the parameters may be inaccurate. The intent of this 
study was to investigate the effects produced for a particular family of 
linear models which contain both continuous and binary coded independent 
variables when the assumption that the continuous variables are
mathematical is not maintained.

The Models 

The linear models investigated can be utilized to alleviate a fre- 
quently occurring problem in behavioral research. This problem arises 
when an investigator desires to examine differences in existing groups 
where the differences could be attributable to some quantifiable concomi- 
tant influence. In such a situation, the investigator would probably 
want to investigate possible differences in the performance of the groups 
as measured by some dependent variable without regard to differences due 
to the concomitant variable. A logical approach to this difficulty would 
be to consider the joint frequency distribution for the dependent variable 
and the concomitant variable for each group. Comparison of the joint 
frequency distributions for the groups in effect makes possible the
comparison of values of the dependent variable for individuals in the various
groups who have the same value for the concomitant variable.

Bottenberg and Ward (1963, pp. 76-86) present a family of linear models
which can be used to make these previously mentioned comparisons in a more
quantitative manner. The most general model of this family is written

$$Y_{ij} = a_j + b_j X_{ij} + e_{ij} \qquad \text{(Model 1)}$$



where $Y_{ij}$ is the value of the dependent variable, $X_{ij}$ is the value of the
concomitant variable, and $e_{ij}$ is the error associated with use of Model 1
with the ith member of the jth group; $j = 1, 2, \ldots, m$ where m is the
number of groups and $i = 1, 2, \ldots, n_j$ where $n_j$ is the number of
individuals in the jth group. (In this model it is assumed that within any
of the populations from which the groups were selected, the expected change in
the value of the dependent variable per unit change in the value of the
concomitant variable is constant over the range of the values of the
concomitant variable.) The evaluation of the constants in this model from
the data produces a unique value of a and b for each group. The value of
a and b for each group results from fitting a regression line to the joint 
frequency distribution for the dependent variable and the concomitant 
variable for each group. The values of a and b then represent the inter- 
cept and slope of the regression line for each group. The determination 
of whether the various groups differ on the dependent variable without 
regard to differences due to the concomitant variable can be made in terms 
of the intercepts and slopes of the group regression lines. 

Probability statements involving possible differences in the intercept 
and the slope parameters for the populations from which the groups were 
selected can be made by calculation of a critical statistic which is a 
function of the error sum of squares in Model 1 and in models derived in 
particular ways from Model 1. Probability statements involving differ- 
ences in the slope parameters for the populations from which the groups 
were selected can be made on the basis of the value of a critical ratio 
which is a function of the error sum of squares (s) from Model 1 which 
can be written

$$s = \sum_{j=1}^{m} \sum_{i=1}^{n_j} e_{ij}^2$$

and the error sum of squares from a model derived from Model 1 which
restricts all of the values of group slope to the same value. This
restricted model is written



$$Y_{ij} = a_j + b X_{ij} + f_{ij} \qquad \text{(Model 2)}$$

where $Y_{ij}$ is the value of the dependent variable, $X_{ij}$ is the value of the
concomitant variable, and $f_{ij}$ is the error associated with the use of
Model 2 with the ith member of the jth group; $j = 1, 2, \ldots, m$ where m
is the number of groups and $i = 1, 2, \ldots, n_j$ where $n_j$ is the number of
individuals in the jth group. The common value of group slope is b and
the a's are the group intercepts. The error sum of squares (t) from
Model 2 can be written

$$t = \sum_{j=1}^{m} \sum_{i=1}^{n_j} f_{ij}^2 .$$

Probability statements involving possible differences in the intercept 
parameters for the populations from which the groups were selected, assum- 
ing that the slope parameters for these populations are equal, can be made 
on the basis of the value of a critical statistic which is a function of 
the error sum of squares (t) from Model 2 and the error sum of squares (r) 
from a model derived from Model 2 which restricts all the values of group 
intercept to the same value. This restricted model is written 



$$Y_{ij} = a + b X_{ij} + g_{ij} \qquad \text{(Model 3)}$$

where $Y_{ij}$ is the value of the dependent variable, $X_{ij}$ is the value of the
concomitant variable, and $g_{ij}$ is the error associated with the use of
Model 3 with the ith member of the jth group; $j = 1, 2, \ldots, m$ where m
is the number of groups and $i = 1, 2, \ldots, n_j$ where $n_j$ is the number of
individuals in the jth group. In Model 3, a is the value of the common
intercept for all the groups and b is the value of the common slope for
all the groups. The error sum of squares (r) from Model 3 can be written

$$r = \sum_{j=1}^{m} \sum_{i=1}^{n_j} g_{ij}^2 .$$

The intent of this research was to investigate the properties of
Models 1, 2 and 3 and the probability statements based on these models 
when the concomitant variable was not a mathematical variable. The 
comparisons made possible by the use of these models are essentially 
those made in analysis of covariance.

Mathematical and Random Variables

In the mathematical treatment of a linear model which is necessary
in order to derive computing expressions for the constants and the critical
statistics, an important consideration is the type of variables which are
used in the model. The two types of variables generally recognized by
mathematical statisticians are random variables and mathematical variables.

The following definition of a random variable has been adapted from 
Alexander (1961).

If for a particular random experiment, $\{A_1, \ldots, A_k\}$ is the set of
outcomes (sample points) defining the sample space of the random
experiment, and if $\{X_1, \ldots, X_k\}$ is a set of numbers such that $X_i$
is associated with the corresponding outcome $A_i$ for $i = 1, \ldots, k$,
then the set of values $\{X_1, \ldots, X_k\}$ is called a random variable
for the particular random experiment.

For a variable to be considered a mathematical or fixed variable, 
the values assumed by the variable must be known constants; that is, the 
values of a mathematical variable must be pre-selected from the range of
possible values assumed by a random variable. For example, consider the
case where an investigator is interested in evaluating the relative effects
of an experimental curriculum and a control curriculum after the removal
of the unwanted influence of difference in initial performance. If the
investigator desired to treat his assessment of initial performance as a
fixed variable in Models 1, 2 and 3, it would be necessary for him to 
select certain values of initial performance before he tested the pupils 
and then to use in the analysis only the pupils who had those specific 
values of initial performance. If he wished to treat initial performance 
as a random variable in the models, he could simply assess the initial 
performance of all the available pupils and use their scores regardless 
of particular values. 

Another important consideration involving the variables which appear 
in a linear model is concerned with the amount of error inherent in the 
process of observation of the values of the variable. Whether a variable 
is treated as fixed or random, the process by which the values of the 
variables are observed usually introduces some error of measurement. The 
relative magnitude of the error introduced by the measurement process is
generally used a priori to determine whether the observed values of a
variable will be treated in the model as measured with error or as error 
free. For example, it would probably not be reasonable to conclude that 
observed initial performance is an errorless assessment of a given sub- 
ject’s potential. 

The mathematical derivation of the sampling distributions of the 
estimates of the unknown parameters and the properties of the critical 
ratios for linear models involve both the type of the variables and
whether or not the variables are considered to be measured with or with- 
out error. In the mathematical treatment of models of the same type that 
are considered here, it is assumed that the X's are both fixed and measured 
without error (Graybill, 1961, pp. 103-104 and 383). Berkson (1950) has 
shown mathematically that if the X values are fixed variables but measured 
with error, the probability statements based on the critical ratio and the 
sampling distributions of the statistics are not affected. In consideration
of further comments by Graybill associated with the assumptions
underlying the various models and the associated mathematical development 
of the models which do consider various cases where the X’s are treated 
as random variables both with and without error, it becomes apparent that 
a general solution to this problem is both difficult and unavailable. 

There are instances in the natural and behavioral sciences when it is 
no problem to design experiments such that a concomitant variable is a 
fixed variable measured with very little error. For instance, if temper- 
ature were considered to be a variable which was critically affecting the 
comparison of yield in two or more manufacturing processes, all the pro- 
cesses could be utilized a given number of times at pre-selected values of 
temperature and then differences in the yields of the processes could be 
evaluated using Models 1, 2 and 3 to make possible comparison of the yield 
of the process with the effect of temperature removed. In the social 
sciences, if practice in an experiment concerning the effects of rein- 
forcement on performance were thought to influence the comparison of 
performance for the various reinforcement conditions, there would be 
little problem involved in selecting certain amounts of practice and then 
assessing performance for a certain number of individuals for each
reinforcement condition at the selected amounts of practice. With amount of
practice a fixed variable measured with little error, use of Models 1,
2 and 3 to compare the performance of the various reinforcement conditions
with the effect of practice removed is in correspondence with the
assumptions for these models.

Unfortunately, the use of Models 1, 2 and 3 in the educational example 
when the concomitant variable is fixed is not a very satisfactory procedure 
because of the related problems of obtaining sufficient subjects who have 
the necessary scores on the concomitant variable within the constraints 
of the experimental situation. This problem is compounded by the diffi- 
culty involved in obtaining relatively errorless values of the concomitant 
variable. What usually occurs in actual practice is that the investigator 
ignores the requirement that the concomitant variable be fixed and error
free and proceeds with the analysis as if this variable were fixed. The 
purpose of this study was to investigate the effects of the failure to 
meet these assumptions of the model when the X values are values of a 
random variable and are measured without error. This was accomplished 
by determining whether differences in the number of incorrect decisions 
based on the critical statistic for differences in the b's in Model 1 and 
the critical statistic for differences in the a's in Model 2 occur when X 
is a random variable rather than a mathematical variable. Also,
comparisons were made between the distributions of the b's in Model 1 and of the
a's in Model 2 when the X values were values of a random variable and 
values of a mathematical variable. The case when the X values represent 
values of a random variable measured with error was not treated in this 
study. 

Methods 

In order to conduct this empirical investigation, computer programs 
were written in FORTRAN to be run on the CDC 6600 Computer System at The 
University of Texas at Austin. These computer programs, which are shown 
in Calkins (1971), allowed values of Y to be randomly selected from various
bivariate populations having predetermined parameters for values of
X either fixed or randomly obtained. The specified characteristics of
the bivariate populations were the type of bivariate distribution and the
means and variances of the X and Y marginal distributions. (In actuality,
it was difficult to maintain constant variance when the X values were
fixed.) Factors of the investigation which were varied are number of
cases (X,Y pairs) sampled from each group, shape of the X marginal
distribution from which values of X were selected, and the variance of the
Y values in each X array. The principal statistics which were observed
are the distributions and expected values of the a's and b's, and the
critical statistics based on possible differences in the a's and b's.

In general, the X and Y marginal distributions of the bivariate 
frequency distributions had means of 50.0 and standard deviations of 
10.0, and correlations of 0.15, 0.30, 0.45, 0.60, 0.75 and 0.90 were
used to determine the variance of the Y values for each X array. Samples
of size five, 13 and 39 were used in experiments of 1,000 samples.

The shapes of the X marginal distributions which were used are normal
distributions and rectangular distributions. The corresponding types
of bivariate distributions which were used are bivariate normal and the 
values of Y normally distributed for each value of X used but with all 
the Y arrays having equal variance. 

Measurement of the Effects 

Critical features of the various sampling distributions of the sta- 
tistics were used to compare the effects of fixed and random selection 
of X values. The distributions of the a's and the b's were compared
with their counterparts through the use of functions of the first four
cumulative moments of their respective distributions. These statistics
(mean, variance, skewness and kurtosis) were calculated using the computing
expressions from Fisher (1958). It was expected that these statistics
would closely approximate the population values.
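
The four summary statistics named above can be illustrated with a minimal Python sketch. It assumes that the computing expressions in question are Fisher's k-statistics; the text cites Fisher (1958) without reproducing the formulas, so this is an assumption, and the function name `moment_statistics` is hypothetical.

```python
def moment_statistics(values):
    """Return (mean, variance, skewness, kurtosis) from Fisher's k-statistics.

    Assumption: skewness is g1 = k3 / k2**1.5 and kurtosis is g2 = k4 / k2**2,
    so both are expected to be near zero for a normal distribution.
    Requires at least four observations and a non-zero variance.
    """
    n = len(values)
    mean = sum(values) / n
    # Sums of powers of deviations about the sample mean.
    s2 = sum((v - mean) ** 2 for v in values)
    s3 = sum((v - mean) ** 3 for v in values)
    s4 = sum((v - mean) ** 4 for v in values)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return mean, k2, k3 / k2 ** 1.5, k4 / k2 ** 2
```

Applied to the 1,000 slope or intercept estimates from one experiment, these values would be compared with the corresponding values for the fixed-X experiment.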

The F statistic is the critical statistic which was investigated.
The F statistic upon which decisions concerning possible differences in
the b's or slopes in Model 1 were based was denoted $F_b$, and the F statistic
upon which decisions concerning possible differences in the a's or intercepts
in Model 2 were based was denoted $F_a$. A concise presentation of a procedure
for the calculation of values of these statistics was adapted from Bottenberg
and Ward (1963, pp. 76-86), although for the actual computations of these
values, computing expressions from Winer (1962, pp. 578-588) were used.
In these expressions, $e_{ij}$ is the error from Model 1 associated with
individual i from group j, $f_{ij}$ is the error from Model 2 associated with
individual i from group j, and $g_{ij}$ is the error from Model 3 associated
with individual i from group j.


Since in this study all the samples for the groups were drawn from
the same population, the expected value of $F_a$ should be near $df_2/(df_2 - 2)$
and that of $F_b$ should be near $df_2/(df_2 - 2)$, where $df_2$ denotes the
denominator degrees of freedom of the respective statistic. Also, not more
than five percent of the values calculated for $F_a$ and $F_b$ should be equal
to or greater than the
specific values of the central F distribution for the proper degrees of
freedom at the .05 confidence level. Departure from what is expected for
either of these criteria would indicate that the values of the critical
statistic are not F distributed, although the latter check is the more
important, since departure from what is expected would upset the decision
rule.
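
To make the two criteria concrete, the following Python sketch computes the error sums of squares s, t and r for Models 1, 2 and 3 and the two F ratios for one sample of groups. It is an illustration under stated assumptions, not the FORTRAN routines of Calkins (1971): the function names are hypothetical, the closed-form sums of squares and products are the standard ones for a single concomitant variable, and the degrees of freedom are assumed from the number of constants in each model.

```python
def corrected_sums(xs, ys):
    """Corrected sums of squares and products (Sxx, Sxy, Syy) for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxx, sxy, syy


def critical_ratios(groups):
    """groups is a list of (xs, ys) pairs, one pair of lists per group."""
    m = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    within = [corrected_sums(xs, ys) for xs, ys in groups]
    # Model 1: a separate intercept and slope for every group.
    s = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in within)
    # Model 2: separate intercepts, one common slope (pooled within-group sums).
    pooled_xx = sum(w[0] for w in within)
    pooled_xy = sum(w[1] for w in within)
    pooled_yy = sum(w[2] for w in within)
    t = pooled_yy - pooled_xy ** 2 / pooled_xx
    # Model 3: one intercept and one slope for all groups combined.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    total_xx, total_xy, total_yy = corrected_sums(all_x, all_y)
    r = total_yy - total_xy ** 2 / total_xx
    df_b = n_total - 2 * m      # assumed error df for Model 1
    df_a = n_total - m - 1      # assumed error df for Model 2
    f_b = ((t - s) / (m - 1)) / (s / df_b)
    f_a = ((r - t) / (m - 1)) / (t / df_a)
    return {"F_b": f_b, "df_b": df_b, "F_a": f_a, "df_a": df_a}


def expected_f_mean(df_denominator):
    """Mean of a central F distribution; defined only for df_denominator > 2."""
    return df_denominator / (df_denominator - 2)
```

For two groups of 13 cases, for example, `expected_f_mean(22)` gives the value that the mean of the observed slope ratios should approach if they are F distributed.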

Sampling Procedure 

In Monte Carlo studies such as this one, where sampling must be done
from a joint frequency distribution representing a population of indivi- 
duals, problems concerning efficient computer usage arise because large 
blocks of computer storage would be required to contain the frequency 
distribution. For this reason and other aspects of efficiency, actual 
frequency distributions were not used in this study. Instead of actually 
sampling from existing frequency distributions, random deviates were 
generated using computer programs based on pseudo random numbers such that 
these random deviates simulated sampling with replacement from distribu- 
tions with desired characteristics. 

The source of pseudo random numbers for this study was RANF^3, a
FORTRAN function, which is available through the CDC 6600 computer system
and documented in the computation center User's Manual of The University
of Texas at Austin. The algorithm by which these pseudo random numbers
were generated appears sufficient, for the purposes of this study, to
consider the pseudo random numbers to be random. This function was
utilized such that the same sequence of random numbers was used in each
experiment.

The random numbers generated by RANF were used in two other functions, 
RNORMD and RANREC, to generate numbers which were random deviates of a 
normally distributed variable with a specified mean and variance in the 
case of RNORMD and random deviates from a rectangularly distributed 
variable with a specified mean and variance in the case of RANREC. 
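
The relationship between the uniform source and the two deviate generators can be sketched in Python as follows. The text does not describe the algorithms inside RNORMD and RANREC, so the Box-Muller transform and the linear rescaling used here are assumptions, `random.random` merely stands in for RANF, and the lower-case names `rnormd` and `ranrec` are hypothetical.

```python
import math
import random


def rnormd(mean, sd, uniform=random.random):
    """Normal deviate from two uniform numbers (Box-Muller transform, assumed)."""
    u1 = 1.0 - uniform()  # shift [0, 1) to (0, 1] so that log(u1) is defined
    u2 = uniform()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + sd * z


def ranrec(mean, sd, uniform=random.random):
    """Rectangular deviate with the requested mean and standard deviation."""
    # A rectangular distribution on [-h, h] has standard deviation h / sqrt(3).
    half_width = sd * math.sqrt(3.0)
    return mean + half_width * (2.0 * uniform() - 1.0)
```

Calling `random.seed(...)` with the same value before each experiment would reproduce the practice of using the same sequence of random numbers in every experiment.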

^3 Actually these so-called pseudo random numbers from RANF can be
viewed as random samples from a continuous rectangular distribution which 
is defined only over the range zero to one. For purposes of this study 
deviates are defined to be random samples from distributions with speci- 
fied characteristics which differ from the characteristics of the distri- 
bution inherent in RANF. Thus the numbers obtained from RANF are called 
random numbers and all numbers used in this study which are functions of 
the numbers obtained from RANF are called random deviates.

The random deviates from RNORMD and RANREC were used to produce
numbers which were themselves random deviates from bivariate frequency 
distributions. (In the following discussion of the procedure used in the 
generation process, it may be helpful for the reader to refer to Figure 1.) 
Random deviates for a bivariate normal frequency distribution were generated 
by first obtaining a random normal value of X from a univariate distribu- 
tion with a mean of 50.0 and a standard deviation of 10.0 by using RNORMD. 
The corresponding Y value is a random deviate from a normal distribution 
with a mean equal to the predicted value of Y for the particular value of 
X and a standard deviation which is the standard error of the estimate
($S_{yx} = S_y \sqrt{1 - r_{yx}^2}$) for predicting Y from a knowledge of X. This random
value of Y was thus generated by again using RNORMD to obtain a random
normal deviate from a distribution with a mean equal to A + BX and a standard
deviation equal to $S_{yx}$, where X is the previously generated deviate
and A, B and $S_{yx}$ are values calculated from the parameters specified for
the bivariate frequency distribution. This pair of X and Y values then
represents a random deviate from a bivariate normal frequency distribution
with specified X and Y univariate means and standard deviations and
bivariate correlation.
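
The two-step generation of an X,Y pair described above can be sketched in Python. This is an illustration, not the original FORTRAN: `random.gauss` stands in for RNORMD, the function names are hypothetical, and the constants follow the usual regression relations B = r S_y / S_x and A = M_y - B M_x together with the standard error of estimate.

```python
import math
import random


def regression_constants(mean_x, sd_x, mean_y, sd_y, r):
    """Return (A, B, S_yx) for predicting Y from X in the specified population."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    s_yx = sd_y * math.sqrt(1.0 - r * r)
    return a, b, s_yx


def bivariate_normal_pair(mean_x, sd_x, mean_y, sd_y, r, rng=random):
    """Draw X first, then draw Y around the regression line at that X."""
    a, b, s_yx = regression_constants(mean_x, sd_x, mean_y, sd_y, r)
    x = rng.gauss(mean_x, sd_x)
    y = rng.gauss(a + b * x, s_yx)
    return x, y
```

With means of 50.0, standard deviations of 10.0 and a correlation of 0.60, the constants are approximately A = 20, B = 0.6 and S_yx = 8.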

For the case where it is necessary to obtain deviates from a bivariate 
normal frequency distribution for normally distributed but fixed values of 
X, the procedure for the generation of the Y values was the same as in
the case of random values of X, but the procedure for obtaining the
X values was different. Thirteen fixed values of X were chosen. These 
values were the mean of the X marginal distribution and six equally spaced 
values above and below this mean. The spacing of these values was deter- 
mined in terms of the value of the standard deviation of the X marginal 
distribution such that these 12 values were equal to the mean plus or minus 
0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 times the standard deviation. The frequency
of occurrence of each of these fixed values was used to determine
the shape of the X marginal distribution. In order that the X marginal
distribution be normally distributed, values of the probability function
of the normal curve were obtained from a z score table using the X values
in z score form as arguments. Since the height of the probability function
of the normal curve for a particular z score can be interpreted as a
proportion of the total number of cases occurring for that score, the
number of cases needed for any value of X for this particular selection
of 13 z scores can be obtained by multiplying one-half the desired sample
size by the height of the probability function of the normal curve for
that X in z score units. The Y value corresponding to this value of X
was then generated as in the random bivariate normal case. This pair of
X and Y values then represents a deviate from a bivariate normal frequency
distribution based on a fixed value of X.

Figure 1. Flowchart showing the methods for obtaining the X and Y values
for each of the various configurations of the bivariate frequency distributions.
The steps shown in the flowchart are:
(1) Specify $M_x$ and $SD_x$ (the univariate mean and standard deviation of X),
$M_y$ and $SD_y$ (the univariate mean and standard deviation of Y), and the
bivariate correlation between X and Y.
(2) Calculate $S_{yx}$, the standard error involved with estimating Y from the
knowledge of X, and the constants A and B of the line that is fit to the
joint distribution of X and Y.
(3) Obtain X in one of four ways: a random deviate distributed normally with
mean $M_x$ and standard deviation $SD_x$ with RNORMD; a fixed value such that
the mean of the X's will be $M_x$, the standard deviation will be $SD_x$ and
the X's will be normally distributed; a random deviate distributed
rectangularly with mean $M_x$ and standard deviation $SD_x$ with RANREC; or a
fixed value such that the mean of all the X's will be $M_x$, the standard
deviation will be $SD_x$ and the X's will be rectangularly distributed.
(4) Calculate $M_{yx}$ (the mean of the Y's for a particular value of X)
= X · B + A.
(5) Obtain a value for Y which is a random deviate distributed normally
with a mean = $M_{yx}$ and a standard deviation = $S_{yx}$ with RNORMD.

For the case where it is necessary to obtain deviates from a bivar- 
iate frequency distribution for rectangularly distributed values of X 
but for the Y values normally distributed for each value of X, again the 
procedure for the generation of the Y values is the same as in the two 
previous cases. The random values of X were obtained by using RANREC to
generate random deviates from a rectangular univariate distribution with
a mean of 50.0 and a standard deviation of 10.0. The fixed rectangularly
distributed values of X were obtained in a manner analogous to the procedure
previously described for obtaining normally distributed fixed values, except
that in the rectangular case a rectangular probability function was
utilized rather than the probability function for the normal curve.

However, it should be noted that when this procedure for both normal 
and rectangular distributions is used to establish the frequency of occur- 
rence of the fixed X values, it is difficult to maintain both the system 
of intervals between the 13 fixed values of X and a given standard devia- 
tion of the X values. For this reason, in the fixed case the intervals 
between the fixed X values were maintained and the standard deviations 
of the X marginal distribution for different sample sizes were allowed to 
vary. For fixed but normally distributed values of X, the standard deviation
of the X values was 7.07 for sample size five, 8.55 for sample size
thirteen and 9.47 for sample size thirty-nine. For fixed but rectangularly
distributed values of X, the standard deviation of the X values was 7.07
for sample size five, 18.71 for sample size thirteen and 18.71 for sample
size thirty-nine.
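
The way the frequencies of the fixed X values determine the standard deviation can be checked with a short Python sketch for the normal case. It assumes that each frequency is one-half the sample size times the normal curve height, rounded to the nearest whole number; the rounding rule and the function names are assumptions, since the text does not state how fractional frequencies were resolved.

```python
import math

# The 13 z scores: 0 and plus or minus 0.5, 1.0, ..., 3.0.
Z_SCORES = [k * 0.5 for k in range(-6, 7)]


def normal_height(z):
    """Height of the standard normal probability function at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fixed_normal_counts(n):
    """Assumed rule: frequency = round(one-half n times the curve height)."""
    return [round(0.5 * n * normal_height(z)) for z in Z_SCORES]


def fixed_x_values(n, mean=50.0, sd=10.0):
    """Expand the frequencies into the list of fixed X values."""
    xs = []
    for z, count in zip(Z_SCORES, fixed_normal_counts(n)):
        xs.extend([mean + z * sd] * count)
    return xs


def population_sd(xs):
    """Standard deviation with divisor n, as for a complete set of fixed values."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
```

Under this rounding rule the sketch gives standard deviations of about 7.07 and 8.55 for sample sizes five and thirteen, in agreement with the values reported above. For sample size thirty-nine, simple rounding yields 40 values rather than 39, so some further adjustment not described in the text would be needed.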

The remainder of the procedure for all cases consisted of generating
the necessary number of X,Y pairs with the appropriate characteristics
for the desired number of cases per group, accumulating the various sums,
sums of squares and sums of products and then utilizing these figures in 
the computing formulas to produce sample values of the slope, intercept, 
standard error of slope and intercept, and critical statistics for the 
slope and intercept. This entire procedure was then repeated one thousand 
times in order to obtain sampling distributions for the slope, intercept, 
and the critical statistics of the slope and intercept. 
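
The repeated-sampling procedure can be sketched end to end in Python for the random-X, bivariate normal case. This is a simplified illustration, not the program of Calkins (1971): it handles only the slope statistic, all names are hypothetical, the error degrees of freedom are assumed from the number of constants in Model 1, and the default critical value 4.30 is the tabled .05 value of F for 1 and 22 degrees of freedom, so it applies only to the default of two groups of 13 cases.

```python
import random


def group_sums(pairs):
    """Corrected sums of squares and products (Sxx, Sxy, Syy) for one group."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxx, sxy, syy


def slope_f(groups):
    """F ratio for equality of slopes: Model 2 against Model 1."""
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    sums = [group_sums(g) for g in groups]
    s = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)  # Model 1 error
    t = sum(v[2] for v in sums) - sum(v[1] for v in sums) ** 2 / sum(v[0] for v in sums)
    df2 = n_total - 2 * m
    return ((t - s) / (m - 1)) / (s / df2), df2


def run_experiment(n_per_group=13, r=0.60, n_groups=2, replications=1000,
                   critical_value=4.30, seed=1):
    rng = random.Random(seed)
    mean, sd = 50.0, 10.0
    b = r                      # slope = r * Sy / Sx with Sx equal to Sy
    a = mean - b * mean        # intercept = My - slope * Mx
    s_yx = sd * (1.0 - r * r) ** 0.5
    slopes, f_values = [], []
    df2 = None
    for _rep in range(replications):
        groups = []
        for _grp in range(n_groups):
            xs = [rng.gauss(mean, sd) for _ in range(n_per_group)]
            pairs = [(x, rng.gauss(a + b * x, s_yx)) for x in xs]
            sxx, sxy, _syy = group_sums(pairs)
            slopes.append(sxy / sxx)
            groups.append(pairs)
        f_value, df2 = slope_f(groups)
        f_values.append(f_value)
    return {
        "mean_slope": sum(slopes) / len(slopes),
        "mean_f": sum(f_values) / len(f_values),
        "expected_f": df2 / (df2 - 2),
        "rejections": sum(1 for f in f_values if f >= critical_value),
    }
```

Because both groups are drawn from the same population, roughly five percent of the 1,000 ratios would be expected to reach the critical value if the statistic is F distributed.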

Results 

The results are presented in Tables 2 through 7. Tables 2 and 3
represent the results of the experiments involving the investigation of 
the effects of violation of the assumption that the concomitant variable 
is a fixed or mathematical variable. Table 1 presents the legend necessary 
to interpret Tables 2 and 3. Tables 4 through 7 were prepared from the
information contained in Tables 2 and 3 to aid in the interpretation of 
Tables 2 and 3. 

Table 1 is an explication of the two and three letter codes which 
identify the various statistics reported for each experiment shown in 
Tables 2 and 3. It should be noted that the reported statistics for each 
experiment contain both expected and observed values pertaining to
intercept and slope. The first set of five statistics refers to various
expected and observed values of the distribution of slopes and the second 
set of five statistics refers to the same values of the distribution of 
intercepts. The next three statistics refer to various observed and 
expected values of the critical ratio related to differences in slope and 
the next three statistics refer to the same values except that they relate 
to differences in intercept. 

Tables 2 and 3 show some of the expected and obtained values of the 

distributions of slope, intercept and critical statistics of the slope and 

« « 

intercept for the two types of bivariate distributions. The X's were 
selected with both fixed and random values for six values of standard 
errors of estimate based on the values of correlation shown for three 
sample sizes and two groups. Tables k through 7 contain summary infor- 
mation from Tables 2 and 3 concerning the discrepancy between the observed 
Legend of Alphabetic Codes Needed to Interpret Tables 3 through 7

ES  - the expected mean of the theoretical sampling distribution of slope
      values, which is calculated from the specified parameters by
      slope = r (S_y / S_x)
      where r is the specified correlation
            S_y is the standard deviation of the Y marginal distribution
            S_x is the standard deviation of the X marginal distribution.

OS  - the mean of the distribution of observed values of slope

SOS - the standard deviation of the distribution of the observed values
      of slope

SS  - the skewness of the distribution of the observed values of slope

KS  - the kurtosis of the distribution of the observed values of slope

EI  - the expected mean of the theoretical sampling distribution of
      intercept values, which is calculated from the specified parameters
      by intercept = M_y - slope * M_x
      where M_x is the mean of the X marginal distribution
            M_y is the mean of the Y marginal distribution.

OI  - the mean of the distribution of the observed values of intercept

SOI - the standard deviation of the distribution of the observed values
      of intercept

SI  - the skewness of the distribution of the observed values of intercept

KI  - the kurtosis of the distribution of the observed values of intercept

ODS - the number of observed F values based on differences in slope which
      are greater than the specified value of the central F distribution
      for the proper degrees of freedom at the .05/.01 confidence level

EFS - the expected mean of the central F distribution for the proper
      degrees of freedom for differences in slope

OFS - the mean of the observed distribution of F values based on
      differences in slope

ODI - the number of observed F values based on differences in intercept
      which are greater than the specified value of the central F
      distribution for the proper degrees of freedom at the .05/.01
      confidence level

EFI - the expected mean of the central F distribution for the proper
      degrees of freedom for differences in intercept

OFI - the mean of the observed distribution of F values based on
      differences in intercept
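The two formulas in the legend can be checked with a short computation. The following is only a minimal sketch: the function names and the parameter values are hypothetical and do not come from the study.

```python
def expected_slope(r, s_y, s_x):
    """ES: expected slope, computed as r * S_y / S_x."""
    return r * s_y / s_x


def expected_intercept(m_y, m_x, slope):
    """EI: expected intercept, computed as M_y - slope * M_x."""
    return m_y - slope * m_x


# Hypothetical population parameters, chosen only for illustration.
r, s_x, s_y, m_x, m_y = 0.5, 10.0, 20.0, 50.0, 100.0

es = expected_slope(r, s_y, s_x)
ei = expected_intercept(m_y, m_x, es)
print(es, ei)  # 1.0 50.0
```

With these assumed values, a correlation of .50 and a Y standard deviation twice that of X give an expected slope of 1.0, and the regression line passes through the point of means (50, 100).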
VALUES AND MEANS FOR THE DISCREPANCY BETWEEN THE EXPECTED AND OBSERVED VALUES OF INTERCEPT
CONSIDERING POSSIBLE EFFECTS DUE TO DISTRIBUTION SHAPE, SAMPLING PROCEDURES, STANDARD ERROR OF ESTIMATE
AND SAMPLE SIZE

VALUES AND MEANS FOR NUMBER OF OBSERVED F VALUES SIGNIFICANT AT THE .05/.01 LEVEL
BASED ON DIFFERENCES IN INTERCEPT CONCERNING POSSIBLE EFFECTS DUE TO DISTRIBUTION SHAPE, SAMPLING
PROCEDURE, STANDARD ERROR OF ESTIMATE AND SAMPLE SIZE

and expected mean values of the distributions of intercept and slope and
the number of observed F values based on differences in intercept and
differences in slope significant at the .05 and .01 levels.

Although there usually appears to be some discrepancy between what
is observed and what is expected in Tables 2 through 7, these
discrepancies are generally of a small magnitude. Some results detectable
in these tables are outlined below.

A. Effects on the moments of the distributions of intercept and slope 

1. Discrepancies between what is expected and observed for the
mean values of the distributions of slope shown in Tables
2, 3 and 4

a. Distribution shape and sampling procedure appear to be
related to the discrepancy of the slope in a complex
manner. The random normal case tends to overestimate
the population slope with the highest absolute discrepancy
while the random rectangular, fixed normal and fixed
rectangular cases tend to underestimate the population
slope in the above order of increasing absolute discrepancy.

b. Sample size appears to be related to discrepancy for 
slope with smaller absolute discrepancy associated with 
larger sample size except for the fixed normal case for 
sample size thirteen. 

c. Standard error of estimate appears to be related to 
discrepancy for slope with least absolute discrepancy 
occurring with smaller values of standard error of 
estimate (higher correlation). 

2. Discrepancies between what is expected and observed for the
mean values of the distributions of intercept shown in
Tables 2, 3 and 5

a. Distribution shape and sampling procedure appear to be
related to the discrepancy of the intercepts in a complex
manner. The random normal case tends to underestimate
the population intercept with the lowest absolute
discrepancy while the fixed rectangular, the fixed
normal and random rectangular cases tend to overestimate
the population intercept in the above order of increasing
absolute discrepancy.

b. Sample size appears to be related to the discrepancy for 
intercepts with smaller absolute discrepancy associated 
with larger sample size except for the fixed normal case 
for sample size thirteen. 

c. Standard error of estimate appears to be related to dis- 
crepancy with least absolute discrepancy occurring with 
the smaller values of standard error of estimate (higher 
correlation). 

3. Discrepancies between what is expected and observed for the
skewness and kurtosis of the distributions of intercept and
slope shown in Tables 2 and 3

a. The skewness and kurtosis of the distributions of inter- 
cept and slope only appear to differ substantially from 
what was expected for the random case for sample size 
five. For this case the skewness and kurtosis appear 
to be related to the values of standard error of esti- 
mate with larger values of skewness and kurtosis associ- 
ated with lower values of standard error. The same effect 
also appears to a smaller degree for the random case when 
the sample size is thirteen. 

B. Effects on the discrepancy between the observed and expected 
mean values of the distributions of F values for intercept and 
slope shown in Tables 2 and 3 

1. The discrepancies between what is expected and observed do 
not appear to be systematically related to 

a. Distribution shape or sampling procedure 

b. Sample size 

c. Value of standard error of estimate 

C. The effects on the critical statistics for decisions concerning 
differences in slope and intercept 

1. Discrepancies between what is expected and observed for the
number of F values based on differences in slope significant
at the .05 and .01 levels shown in Tables 2, 3 and 6

a. Distribution shape and sampling procedure appear to be
related to the number of observed F values significant
at the .05 and .01 levels in a complex manner. At the
.05 level, the order of average highest occurrence and
percent of error relative to the expected value of
fifty is random normal (15% error), fixed rectangular
(8% error), fixed normal (3% error), and random
rectangular (9% error). At the .01 level the order of highest
average occurrence of the F values and percent error
relative to the expected value of ten is fixed
rectangular (31% error), random normal (4% error), fixed
normal (5% error), and random rectangular (10% error).

b. Standard error of estimate does not appear to be
systematically related to the number of significant F values
at the .05 and .01 levels.

c. Sample size does not appear to be systematically related
to the number of significant F values at the .05 and .01
levels.

2. Discrepancies between what is expected and observed for the
number of F values based on differences in intercept
significant at the .05 and .01 levels shown in Tables 2, 3 and 7

a. Distribution shape and sampling procedure appear to be
related to the number of observed F values significant
at the .05 and .01 levels in the following manner. At
the .05 level, the order of average highest occurrence
and percent error relative to the expected value of fifty
is random normal (14% error), random rectangular (6% error),
fixed normal (5% error), and fixed rectangular (1% error).
At the .01 level, the order of average highest occurrence
and percent error relative to the expected value of ten
is random normal (7% error), random rectangular (1% error),
fixed normal (2% error), and fixed rectangular (9% error).

b. Standard error of estimate does not appear to be
systematically related to the number of significant
F values at the .05 and .01 levels.

c. Sample size does not appear to be systematically related 
to the number of significant F values at the .05 and .01 
levels. 
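The counts of significant F values discussed under C can be illustrated with a small Monte Carlo sketch. It is not the procedure used in the study. It assumes 1,000 replications per condition (suggested by the expected counts of fifty and ten), two groups of thirteen cases drawn from the same bivariate normal population, a population correlation of .50, and evenly spaced X values for the fixed case. All function names are hypothetical. The critical values 4.30 and 7.95 are the tabled .05 and .01 points of F with 1 and 22 degrees of freedom.

```python
import random

F05, F01 = 4.30, 7.95  # tabled critical values of F(1, 22)


def sums(xs, ys):
    """Corrected sums of squares and cross-products (Sxx, Sxy, Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_f(groups):
    """F ratio (1 and N - 4 df) for the difference in slope between groups."""
    parts = [sums(xs, ys) for xs, ys in groups]
    n_total = sum(len(xs) for xs, _ in groups)
    # Full model: each group has its own intercept and slope.
    sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)
    # Restricted model: separate intercepts, one common slope.
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    sse_restricted = syy_w - sxy_w ** 2 / sxx_w
    return (sse_restricted - sse_full) / (sse_full / (n_total - 4))


def count_significant(n, rho, random_x, reps=1000, seed=0):
    """Count F values exceeding the .05 and .01 critical values."""
    rng = random.Random(seed)
    # Assumed fixed design: n evenly spaced X values (not from the study).
    fixed = [-1.5 + 3.0 * i / (n - 1) for i in range(n)]
    resid_sd = (1 - rho ** 2) ** 0.5  # standard error of estimate, unit-variance Y
    hits05 = hits01 = 0
    for _ in range(reps):
        groups = []
        for _ in range(2):
            xs = [rng.gauss(0, 1) for _ in range(n)] if random_x else fixed
            ys = [rho * x + rng.gauss(0, resid_sd) for x in xs]
            groups.append((xs, ys))
        f = slope_f(groups)
        if f > F05:
            hits05 += 1
        if f > F01:
            hits01 += 1
    return hits05, hits01


def percent_error(observed, expected):
    """Percent error of an observed count relative to its expected value."""
    return 100.0 * (observed - expected) / expected


random_counts = count_significant(13, 0.5, random_x=True)
fixed_counts = count_significant(13, 0.5, random_x=False)
print("random X:", random_counts, "fixed X:", fixed_counts)
print("percent error at .05, random X:", percent_error(random_counts[0], 50))
```

Under these assumptions both conditions share one slope and one intercept, so every significant F value is a type one error. An average count of 57.5 against an expected fifty, for example, corresponds to a 15% error.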

In summary, one may say that although few of the observed values are 
exactly the ones expected, generally these differences are of small mag- 
nitude. The difficulties which were predicted concerning non-normality 
of the distributions of intercept and slope for the random case do appear 
for small sample sizes. However, the critical features of intercept and 
slope do not appear to be different enough from what is expected to merit 
concern for practical purposes. Also the lack of normality does not seem
to cause a sufficient increase in the number of type one errors for
decisions about differences in intercepts and slopes to merit concern for
practical purposes.
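The non-normality noted for the random case at small sample sizes can be illustrated by computing the skewness and kurtosis of a simulated distribution of slopes. This is a rough sketch under assumptions that are not taken from the study: a sample size of five, a correlation of .50, 5,000 replications, evenly spaced X values for the fixed case, and kurtosis defined as the fourth standardized moment (3 for a normal distribution). The section does not state which kurtosis convention the study used.

```python
import random


def slope(xs, ys):
    """Least-squares slope of Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def skew_kurt(values):
    """Moment-based skewness and kurtosis (kurtosis is 3 for a normal)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def slope_moments(n, rho, random_x, reps=5000, seed=1):
    """Skewness and kurtosis of the slope over many simulated samples."""
    rng = random.Random(seed)
    # Assumed fixed design: n evenly spaced X values (not from the study).
    fixed = [-1.5 + 3.0 * i / (n - 1) for i in range(n)]
    resid_sd = (1 - rho ** 2) ** 0.5
    slopes = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)] if random_x else fixed
        ys = [rho * x + rng.gauss(0, resid_sd) for x in xs]
        slopes.append(slope(xs, ys))
    return skew_kurt(slopes)


random_skew, random_kurt = slope_moments(5, 0.5, random_x=True)
fixed_skew, fixed_kurt = slope_moments(5, 0.5, random_x=False)
print("random X:", random_skew, random_kurt)
print("fixed X: ", fixed_skew, fixed_kurt)
```

With fixed X and normal errors the slope is itself normally distributed, so its kurtosis should stay near 3. With random X and only five cases the slope distribution has much heavier tails, which is the kind of departure the summary describes.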

Conclusions

In the mathematical treatment of linear models, certain of the
independent variables are assumed to be fixed variables (Graybill, 1961).
When this assumption is made, it can be shown that the computed estimates
of the parameters occurring in the models have the desirable
characteristics that they are "good" estimates and are normally distributed.

It is often convenient to overlook this assumption when linear 
models are utilized in a research situation, since the nature of a variable 
often does not allow the researcher to select cases with particular values 
of a variable without discarding large amounts of data. This empirical 
study was undertaken as an attempt to discover the effects produced for 
the computed statistics and for the decisions made on the basis of a 
critical ratio concerning differences in these statistics for a particular 
family of linear models when certain independent variables are not fixed 
variables. 

The assumption that certain independent variables have fixed values
can be violated in one or both of two ways: (1) the researcher can fail
to pre-select values of the variables which will be found in the data
and utilized to estimate the statistics, and/or (2) measurement error can
be introduced through the process of observing the values of the
independent variables. Berkson (1950) has shown that if for certain linear
models the values of the independent variables are only allowed to assume
fixed values, the values of these variables can be observed with error
without disturbing the mathematically desirable distributional
characteristics of the estimates of the unknown parameters. The present study
focused on the effects produced for only one of the two cases of violation
of this assumption for which no mathematical solutions are available.
The case investigated occurs when certain independent variables are not
fixed but are observed without error.

The particular family of linear models investigated is a set
presented by Bottenberg and Ward (1963, pp. 76-86), who apply them to
problems generally approached by the use of the technique of analysis
of covariance. The estimable terms in these models represent the
estimates of the parameters of intercepts and slopes of group regression lines.

This study investigated the following consequences of the departures
from the assumption for persons doing research: (1) the estimates of the
slope and intercept parameters are not "good," (2) the distributions of
the values of these estimates are non-normal, and/or (3) the number of
erroneous decisions concerning differences in the estimates is greater
than expected.

Some difficulty was encountered in interpreting the results of this
study because differences between what is observed and what is expected
may be due to sampling fluctuation or to systematic fluctuation of a
subtle nature caused by varying the various factors in the experiments.
The results are probably not rigorous enough to please a mathematical
statistician. However, these discrepancies were of a small enough
magnitude and the values selected for the various factors were probably
general enough that the results are of practical value.

In general, the results of this study indicate that the violation of
the assumption that the independent variable is fixed does not produce
enough disparity between what is expected and what is observed for any
of the previously mentioned consequences to be a problem for persons
doing research. Thus it seems reasonable to conclude that for this set
of linear models, within the confines of the values of the factors selected
for this study and for sample sizes not too small (in the neighborhood of
thirteen and greater), the effects of the violation of the assumption that
the independent variable is fixed present little or no problem for
researchers.